practice category
Liu
Privacy policies are commonly used to inform users about the data collection and use practices of websites, mobile apps, and other products and services. However, the average Internet user struggles to understand the contents of these documents and generally does not read them. Natural language processing and machine learning techniques offer the promise of automatically extracting relevant statements from privacy policies to help generate succinct summaries, but current techniques require large amounts of annotated data. The highest quality annotations require legal experts, but their efforts do not scale efficiently. In this paper, we present results on bridging the gap between privacy practice categories defined by legal experts and topics learned from Non-negative Matrix Factorization (NMF). To do this, we investigate the intersections between the vocabulary sets identified as most significant for each category, using a logistic regression model, and the vocabulary sets identified by topic modeling. The intersections exhibit strong matches between some categories and topics, although other categories have weaker affinities with topics. Our results show a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.
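The vocabulary-intersection comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, topic names, and term sets below are hypothetical toy data, and the Jaccard score is an assumed way to quantify an intersection (the abstract only says that intersections are investigated). In practice the category sets would come from the highest-weighted terms of a logistic regression model and the topic sets from the top terms of each NMF topic.

```python
def best_topic_per_category(category_vocab, topic_vocab):
    """For each category, return the topic with the largest vocabulary overlap.

    Overlap is scored with the Jaccard index (|A & B| / |A | B|), which is an
    assumption of this sketch rather than a measure named in the abstract.
    """
    results = {}
    for category, cat_terms in category_vocab.items():
        best_topic, best_score = None, 0.0
        for topic, topic_terms in topic_vocab.items():
            union = cat_terms | topic_terms
            score = len(cat_terms & topic_terms) / len(union) if union else 0.0
            if score > best_score:
                best_topic, best_score = topic, score
        # best_topic stays None when the category shares no terms with any topic
        results[category] = (best_topic, best_score)
    return results


# Hypothetical toy vocabularies, for illustration only.
category_vocab = {
    "data_collection": {"collect", "information", "use", "provide"},
    "data_retention": {"retain", "period", "store", "delete"},
}
topic_vocab = {
    "topic_0": {"collect", "information", "provide", "services"},
    "topic_1": {"cookies", "browser", "tracking", "web"},
}

result = best_topic_per_category(category_vocab, topic_vocab)
for category, (topic, score) in result.items():
    print(category, topic, round(score, 2))
```

In this toy data one category matches a topic strongly while the other matches none, mirroring the abstract's observation that some categories have weaker affinities with topics.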
Quality Assurance Challenges for Machine Learning Software Applications During Software Development Life Cycle Phases
Alamin, Md Abdullah Al, Uddin, Gias
Over the past decades, revolutionary advances in Machine Learning (ML) have led to the rapid adoption of ML models in software systems of diverse types. Such Machine Learning Software Applications (MLSAs) are gaining importance in our daily lives. Consequently, the Quality Assurance (QA) of MLSAs is of paramount importance. Several research efforts are dedicated to determining the specific challenges that arise when adopting ML models into software systems. However, we are aware of no research that offers a holistic view of the distribution of those ML quality assurance challenges across the various phases of the software development life cycle (SDLC). This paper conducts an in-depth literature review of a large volume of research papers that focus on the quality assurance of ML models. We develop a taxonomy of MLSA quality assurance issues by mapping the various ML adoption challenges across the different phases of the SDLC. We provide recommendations and research opportunities to improve SDLC practices based on the taxonomy. This mapping can help prioritize quality assurance efforts for MLSAs where the adoption of ML models can be considered crucial.